ABSTRACT

Human memory has been thoroughly studied and modeled
in psychology, but mainly in laboratory settings under
simplified conditions. For application in practical adaptive
educational systems we need simple and robust models that
can cope with aspects like varied prior knowledge or
multiple-choice questions. We discuss and evaluate several
models of this type. We show that, using the extensive data
sets collected by online educational systems, it is possible
to build well-calibrated models and gain interesting insights,
which can be used to improve adaptive educational systems.

1. INTRODUCTION 

Development of intelligent tutoring systems and other
adaptive educational systems is often focused on teaching
mathematics, physics, and similar domains. The related
research in student modeling is thus concerned mainly with
modeling skill acquisition. Another interesting area where
adaptability is very useful is the learning of facts [8],
particularly in domains with varied prior knowledge like
vocabulary, geography, or human anatomy. In this context,
modeling of students’ memory is important.

Principles of human memory and their consequences for
education have been extensively studied in psychology, e.g.,
[2, 5, 9, 10]. Models developed in psychological research are
not, however, easily applicable in practical implementations
of adaptive practice. The purpose of models described in the
psychological literature is to describe and explain
mechanisms of human memory, e.g., the spacing effect [9].
Experiments are conducted as lab studies in controlled
settings, in areas with little prior knowledge, e.g., learning
of arbitrary word lists, nonsense syllables, obscure facts, or
Japanese vocabulary.

In the context of development of adaptive educational
systems, our goal is more pragmatic: we do not need to capture
all details of human memory; we need a model which will
work well in an adaptive system. A model needs to provide
good input for other modules of an adaptive system (e.g.,
question selection or open learner model). The specific
context of our work is an adaptive application slepemapy.cz
for learning geography [8].

Although we can afford to model memory in a simplified
manner, we have to deal with issues like varied prior
knowledge, multiple-choice questions (with the possibility of
guessing), and no control over when students use the system.
Compared to laboratory studies, online educational systems can
easily collect much more extensive data (millions of answers),
so we can employ machine learning techniques to find fitting
models. Specifically, in our work we use this approach to
detect the dependence of memory activation on the time from
the previous answer. The standard approach [9] is to make an
assumption about the functional form of such dependence.
We learn the function from the data, and it turns out to be
an S-shaped function which cannot be represented
symbolically in a straightforward way. The results also show
that there are large differences between the learning of
different facts, even in a seemingly compact domain like
geography. These results may be useful for improving the
behaviour of adaptive educational systems.

2. MODELING 

Before we go into the description of models, let us clarify
the context of the considered models. In previous work [8] we
described a modular architecture for adaptive practice of
facts based on three modules: estimation of prior knowledge,
estimation of current knowledge, and construction of
questions. Here we focus on improving the estimation of
current knowledge by taking the timing between answers into
account.

Specifically, we assume the following input: for each
student and repeatedly answered fact (e.g., a country in the
case of our application), we have an initial estimate of the
student’s knowledge of the fact and data about a sequence
of the student’s answers. For each answer we consider the
correctness of the answer, the type of question (either an
open question or a multiple-choice question with a specified
number of options), and the time from the previous answer (in
seconds). For estimating initial activation we use a variant
of the Elo rating system [4, 13] as specified in [8]. For the
purposes of this work this estimation is treated as a black
box.

As an output, a model provides the estimated probability that
the next answer will be correct. This output can be used for
the adaptive construction of questions (in such a way that
they have appropriate difficulty) [7, 8]. Model parameters
can also be used for presenting feedback to students in the
form of an open learner model.

2.1 Basic Approach 

Student models of learning [3] most commonly use either a
binary skill (a typical model of this type is Bayesian
Knowledge Tracing) or a continuous skill with the probability
of a correct answer specified by the logistic function of the
skill. For modeling memory it is natural to use a continuous
skill since memory is built gradually - as opposed, for
example, to understanding or insight in mathematics, which may
undergo a sudden transition from an unlearned to a learned
state as assumed by Bayesian Knowledge Tracing [1]. Modeling
based on the logistic function was also previously used for
modeling memory [9]. In the following we use the notion of
memory activation instead of skill.





All models that we consider have the following basic form.
Based on the data we estimate memory activation $m$. The
probability that the next answer will be correct is estimated
using a logistic function: $P(m) = \frac{1}{1 + e^{-m}}$. In
the case of a multiple-choice question with $n$ options the
probability of a correct answer is given by the shifted
logistic function:
$P(m) = \frac{1}{n} + \left(1 - \frac{1}{n}\right) \frac{1}{1 + e^{-m}}$.
Note that this functional form is a simplification, since it
does not consider the possibility that a student answers
correctly by ruling out distractors.
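
The two prediction formulas can be illustrated with a minimal Python sketch. The function name `predict_correct` and the convention of passing `None` for an open question are assumptions made for this illustration, not part of the original system.

```python
import math


def predict_correct(m, n_options=None):
    """Estimated probability of a correct answer for memory activation m.

    n_options is None for an open question (assumed convention), or the
    number of options of a multiple-choice question.
    """
    p_known = 1.0 / (1.0 + math.exp(-m))  # plain logistic function
    if n_options is None:
        return p_known
    guess = 1.0 / n_options  # chance of a correct random guess
    return guess + (1.0 - guess) * p_known  # shifted logistic function


open_question = predict_correct(0.0)
four_options = predict_correct(0.0, n_options=4)
```

With activation 0 the open question gives probability 0.5, while four options raise it to 0.625, because a quarter of the remaining probability mass is attributed to guessing.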

2.2 Computing Memory Activation 

A basic model applicable under the outlined approach is
a simplified, one-dimensional variant of Performance Factor
Analysis (PFA) [11] (originally PFA was formulated in terms
of skills and vectors, as it uses multiple knowledge
components). In this model the memory activation is given by a
linear combination of an initial activation and past successes
and failures of a student: $m = \beta + \gamma s + \delta f$,
where $\beta$ is the initial activation, $s$ and $f$ are
counts of previous successes and failures of the student, and
$\gamma$ and $\delta$ are parameters that determine the change
of the skill associated with correct and incorrect answers.
The basic disadvantage of this simple approach is that it does
not consider the time between attempts; in fact it even
ignores the order of answers (it uses only the total number of
correct and incorrect answers).
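
A short Python sketch of this one-dimensional PFA variant makes the order-independence explicit. The function name and the parameter values in the usage lines are illustrative assumptions only.

```python
def pfa_activation(beta, answers, gamma, delta):
    """Memory activation as beta + gamma * successes + delta * failures.

    answers is a sequence of booleans (True = correct answer).
    """
    successes = sum(1 for correct in answers if correct)
    failures = len(answers) - successes
    return beta + gamma * successes + delta * failures


# Hypothetical parameter values, chosen only to keep the arithmetic simple.
a = pfa_activation(0.5, [True, False, True, True], gamma=1.0, delta=-0.5)
b = pfa_activation(0.5, [True, True, True, False], gamma=1.0, delta=-0.5)
```

Both calls give the same activation, since only the counts of correct and incorrect answers enter the formula.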

The ACT-R model [9, 12] of spacing effects can be considered
an extension of this basic model. In this model the memory
activation is estimated as
$m = \beta + \log\left(\sum_i b_i t_i^{-d_i}\right)$, where
the sum is over all previous attempts, values $t_i$ are the
ages of previous attempts, values $b_i$ capture the influence
of correctness of answers, and $d_i$ is the decay rate, which
is computed by recursive equations [9]. The model also
includes additional modifiers for treating time between
sessions. The focus of the model is on modeling the decay rate
to capture the spacing effect. Studies using this model
[9, 12] did not take into account the probability of guessing
and variable initial knowledge of different items (initial
activation was either a global constant or a student
parameter). In the current work we focus on these factors and
for the moment omit modeling of spacing effects.

Figure 1: Calibration for the PFAE model with different
time effect functions. The y axis shows the difference
between observed frequency of correct answers and average
prediction.

Another possible extension [8] of the basic PFA model is to
combine it with some aspects of the Elo rating system [4,
13]; in the following we denote this version as PFAE (PFA
Elo/Extended). The estimated memory activation is updated
after each answer as follows:

$$m := \begin{cases} m + \gamma \cdot (1 - P(m)) & \text{if the answer was correct} \\ m + \delta \cdot P(m) & \text{if the answer was incorrect} \end{cases}$$
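
The update rule can be sketched in Python as follows. The helper names are hypothetical. The usage lines take the update constants 2.290 and -0.917 from the fitted values reported in Section 3.1. The sketch assumes that, for a multiple-choice question, P(m) in the update is the shifted logistic function.

```python
import math


def predict_correct(m, n_options=None):
    """Logistic prediction, shifted by guessing for multiple-choice questions."""
    p = 1.0 / (1.0 + math.exp(-m))
    if n_options is None:
        return p
    return 1.0 / n_options + (1.0 - 1.0 / n_options) * p


def pfae_update(m, correct, gamma, delta, n_options=None):
    """Return the memory activation after one observed answer."""
    p = predict_correct(m, n_options)
    if correct:
        return m + gamma * (1.0 - p)  # surprising successes move m the most
    return m + delta * p  # surprising failures move m the most


m = 0.0
for correct in [False, True, True]:
    m = pfae_update(m, correct, gamma=2.290, delta=-0.917)
```

The size of each update depends on how unexpected the answer was, which is the Elo-like aspect of the model: a correct answer that was already predicted with high probability changes the activation only slightly.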

To include the timing information in this model, we can
locally increase the memory activation for the purpose of
prediction, i.e., instead of $P(m)$ we use $P(m + f(t))$,
where $t$ is the time (in seconds) from the last attempt and
$f$ is a time effect function. As $m$ denotes memory
activation, the value $f(t)$ corresponds to a temporary
increase in memory activation due to the (short) time since
the previous exposure to an item.
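
A minimal Python sketch of prediction with a time effect follows. The time effect is passed in as a function. The example uses the logarithmic form 1.6 - 0.1 log(t) that the paper evaluates, and assumes that log denotes the natural logarithm. The function names are hypothetical.

```python
import math


def predict_with_time_effect(m, t, time_effect, n_options=None):
    """Predict correctness using the activation m boosted by time_effect(t).

    t is the time in seconds since the previous attempt on the same item.
    """
    boosted = m + time_effect(t)  # the boost is used for prediction only
    p = 1.0 / (1.0 + math.exp(-boosted))
    if n_options is None:
        return p
    return 1.0 / n_options + (1.0 - 1.0 / n_options) * p


def log_time_effect(t):
    """Logarithmic time effect; natural logarithm is an assumption."""
    return 1.6 - 0.1 * math.log(t)


after_one_minute = predict_with_time_effect(0.0, 60, log_time_effect)
after_one_day = predict_with_time_effect(0.0, 86400, log_time_effect)
```

The stored activation is left unchanged; only the prediction is adjusted, so the same activation yields a higher predicted probability one minute after the last exposure than one day after it.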

It is natural to use some simple analytic function as a time
effect function, but analysis of our data suggests that this
approach does not work well. Figure 1 shows calibration
analysis for two time effect functions: $f(t) = 80/t$ (used in
previous work [8]) and $f(t) = 1.6 - 0.1 \log(t)$ (the
functional form is based on [9] and fitted to data). We see
that neither of these functions leads to well-calibrated
predictions. Since we were not able to find a simple time
effect function that would provide a good fit, we represent
the function $f(t)$ as a staircase function with fixed bounds
$b$ and values $v$ which we learn from the data:

$$f(t) = \begin{cases} v_i & \text{if } b_i \le t < b_{i+1} \\ 0 & \text{otherwise} \end{cases}$$

3. EXPERIMENTS

We report experiments with the PFAE model with a time effect
function. For evaluation we used data from an online system
for practicing geography [8] (slepemapy.cz). Data were
filtered to include only students with at least 20 answers and
items (places) with at least 40 answers, and we consider only
sequences where a student answered at least 3 questions about
an item. For experiments we divided the data into 10 sets,
each containing 52,190 sequences of answers.


Proceedings of the 8th International Conference on Educational Data Mining 





Figure 2: Time effect function — average from 10 
independent data sets, error bars show standard de- 
viations of parameter estimates. 


3.1 Model Parameters 

As the fixed bounds used in the staircase representation of
the time effect function we have chosen the following values
(in seconds): 0, 60, 90, 150, 300, 600, 1800, 10800, 86400,
259200, 2592000. These values were chosen to be easily
interpretable (e.g., 30 minutes, 1 day) and at the same time
to give a reasonably even distribution of data into
individual bins.
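
With these bounds, the staircase time effect function can be evaluated by a binary search over the bins. The Python sketch below makes two assumptions: each bin includes its lower bound and excludes its upper bound, and the function is 0 outside all bins. The `example_values` list is a made-up placeholder, not the fitted values shown in Figure 2.

```python
from bisect import bisect_right

# Bin bounds in seconds, as listed in the text.
BOUNDS = [0, 60, 90, 150, 300, 600, 1800, 10800, 86400, 259200, 2592000]


def staircase(t, values, bounds=BOUNDS):
    """Return values[i] for bounds[i] <= t < bounds[i + 1], and 0 otherwise."""
    if len(values) != len(bounds) - 1:
        raise ValueError("need exactly one value per bin")
    if t < bounds[0] or t >= bounds[-1]:
        return 0.0
    return values[bisect_right(bounds, t) - 1]


# Hypothetical placeholder values for the 10 bins.
example_values = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
boost_after_30_seconds = staircase(30, example_values)
boost_after_one_day = staircase(86400, example_values)
```

Because the bounds are fixed, the only learned quantities are the per-bin values, which keeps the representation flexible without committing to a symbolic form.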

The model has the following parameters which we need to
estimate from the data: update constants $\gamma$, $\delta$
and the vector $v$ representing the time effect function. To
estimate these parameters we use gradient descent. To evaluate
the stability of parameter estimates we computed the parameter
values for the 10 independent data sets. The results show that
the obtained parameters are very stable:
$\gamma = 2.290 \pm 0.042$, $\delta = -0.917 \pm 0.018$;
values $v$ for the representation of the time effect function
are depicted in Figure 2.
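
The text does not specify the loss function or how gradients are computed. The following Python sketch is therefore only one possible reading: it fits the two update constants by gradient descent on the mean log loss, using finite-difference gradients, open questions only, and no time effect. The toy data, starting values, learning rate, and function names are all hypothetical.

```python
import math


def mean_log_loss(gamma, delta, sequences):
    """Average negative log-likelihood of all answers under PFAE updates."""
    total, count = 0.0, 0
    for initial_activation, answers in sequences:
        m = initial_activation
        for correct in answers:
            p = 1.0 / (1.0 + math.exp(-m))
            total -= math.log(p if correct else 1.0 - p)
            # PFAE update of the activation after observing the answer.
            m += gamma * (1.0 - p) if correct else delta * p
            count += 1
    return total / count


def fit_gamma_delta(sequences, gamma=1.0, delta=-0.5,
                    rate=0.05, steps=300, eps=1e-5):
    """Plain gradient descent with central finite-difference gradients."""
    for _ in range(steps):
        grad_gamma = (mean_log_loss(gamma + eps, delta, sequences)
                      - mean_log_loss(gamma - eps, delta, sequences)) / (2 * eps)
        grad_delta = (mean_log_loss(gamma, delta + eps, sequences)
                      - mean_log_loss(gamma, delta - eps, sequences)) / (2 * eps)
        gamma -= rate * grad_gamma
        delta -= rate * grad_delta
    return gamma, delta


# Toy data: (initial activation, sequence of answer correctness).
toy_sequences = [
    (0.0, [True, True, False, True]),
    (-1.0, [False, False, True, True]),
    (1.0, [True, True, True, True]),
    (-0.5, [False, True, False, True]),
]
fitted_gamma, fitted_delta = fit_gamma_delta(toy_sequences)
```

Repeating such a fit on independent subsets of the data and comparing the resulting values is the kind of stability check described above.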

Since our data set is large and parameter estimates are
stable, we can afford to do a more detailed analysis. Figure 3
shows fitted time effect functions and $\gamma$, $\delta$
values when the parameters are fitted using only part of the
data. Figure 3 A shows that there is quite a large difference
between parameter values for cases with high and low prior
knowledge. This suggests a possible improvement to the PFAE
model - not just by including more parameters, but also by
changing its functional form. However, prior knowledge is not
the only factor that plays a role. Figure 3 B shows fitted
parameters for several types of places. In all of these cases
the prior knowledge is low, yet there are still large
differences between fitted parameter values. These parameters
may contain useful information about students’ learning in
particular parts of the domain, e.g., data in Figure 3 B
illustrate that it is easier to learn states of Germany than
provinces of China.

Figure 3: Time effect function and $\gamma$, $\delta$
parameters fitted to filtered data: A) by estimated prior
knowledge, B) by the type of a place.

In the case of countries we have enough data to perform
parameter fitting for individual places. In this case we fix
the time effect function (as learned on the whole data set and
reported in Figure 2) and we learn only the $\gamma$, $\delta$
parameters on data for a single place. We use only places for
which we have at least 1300 students answering at least 3
questions. The fitted parameter $\gamma$ has an interpretable
meaning: “how easy it is to remember a country”. Examples of
countries with high $\gamma$ (> 3.3): Western Sahara, Southern
Sudan, Vietnam, Egypt, Somalia; countries with low $\gamma$
(< 1.7): Bulgaria, Romania, Serbia, Moldova. Note that the
reported results are clearly dependent on the origin of
students using the system - in our case mostly Czech students.

3.2 Accuracy of Predictions 

Table 1 shows a comparison of several model variants with
respect to three common performance metrics [14]: root mean
square error (RMSE), log-likelihood (LL), and area under
the ROC curve (AUC). The results show averages from 10
runs on different training/testing sets. The results are
consistent over the three metrics and show that the PFAE
models bring quite a large improvement over the PFA model.
Differences between variants of the PFAE model due to the
time effect function used are statistically significant, but
otherwise rather small. Individual predictions are actually
highly correlated (correlation coefficient around 0.97).

Table 1: Comparison of models with respect to three
performance metrics.

model   time effect         RMSE     LL        AUC
PFA     -                   0.3593   -106517   0.719
PFA     80/t                0.353    -103441   0.7195
PFAE    80/t                0.3377   -94454    0.757
PFAE    1.6 - 0.1 log(t)    0.3367   -93987    0.7591
PFAE    staircase           0.3363   -93642    0.7614
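
For reference, the three metrics can be computed from predicted probabilities and observed outcomes as in the Python sketch below. It assumes that LL is the summed natural-log likelihood, which is consistent with the magnitude of the reported values but not stated explicitly. AUC is computed here by direct pairwise comparison, which is simple but quadratic in the number of answers. The sample predictions are made up.

```python
import math


def rmse(predictions, outcomes):
    """Root mean square error between predictions and 0/1 outcomes."""
    squared = [(p - y) ** 2 for p, y in zip(predictions, outcomes)]
    return math.sqrt(sum(squared) / len(squared))


def log_likelihood(predictions, outcomes):
    """Summed log-likelihood of the observed outcomes (natural logarithm)."""
    return sum(math.log(p) if y else math.log(1.0 - p)
               for p, y in zip(predictions, outcomes))


def auc(predictions, outcomes):
    """Chance that a correct answer is ranked above an incorrect one."""
    positives = [p for p, y in zip(predictions, outcomes) if y]
    negatives = [p for p, y in zip(predictions, outcomes) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in positives for b in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up predictions and outcomes (1 = correct answer).
sample_predictions = [0.9, 0.4, 0.3, 0.6]
sample_outcomes = [1, 1, 0, 0]
sample_auc = auc(sample_predictions, sample_outcomes)
```

RMSE and LL are sensitive to calibration of the predicted probabilities, whereas AUC depends only on their ordering, which is one reason to report all three.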

4. DISCUSSION 

We have evaluated several variants of a model of memory
activation in the context of adaptive practice of facts. We
proposed a model which incorporates the effect of the time
from the previous answer by a general staircase function,
which is learned from data (as opposed to assuming a specific
symbolic form of the function). The model is better calibrated
than the other studied models and provides slightly better
predictions. More importantly, the model is simple, and its
parameters are robust and easy to learn from data. The learned
function also provides interesting insight into students’
memory in the particular application: there is a fast decrease
in memory activation within the first 10 minutes, then the
effect is nearly steady for 1 day, and after that the
activation decreases again.
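
The calibration comparison referred to above can be sketched in Python. The sketch assumes that answers are grouped into bins by time since the previous answer and that, for each bin, the average prediction is subtracted from the observed frequency of correct answers, following the description of the y axis in Figure 1. The record layout and the sample data are hypothetical.

```python
from bisect import bisect_right


def calibration_by_time(records, bounds):
    """Observed frequency minus mean prediction for each time bin.

    records: iterable of (seconds_since_previous, predicted_prob, correct).
    Returns a dict mapping bin index to the calibration difference.
    """
    sums = {}
    for t, predicted, correct in records:
        i = bisect_right(bounds, t) - 1  # bin with bounds[i] <= t
        n, pred_sum, correct_sum = sums.get(i, (0, 0.0, 0))
        sums[i] = (n + 1, pred_sum + predicted, correct_sum + (1 if correct else 0))
    return {i: correct_sum / n - pred_sum / n
            for i, (n, pred_sum, correct_sum) in sorted(sums.items())}


# Made-up records and a shortened list of bounds for illustration.
sample_records = [(30, 0.8, True), (45, 0.6, False), (100, 0.5, True)]
differences = calibration_by_time(sample_records, [0, 60, 90, 150])
```

Values near zero in every bin indicate a well-calibrated model; a negative value means the model was overconfident for that range of times.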

By performing fine-grained analysis of the data, it is
possible to use the model parameters to determine items that
are easy or difficult to remember. Such results may be useful
for improving educational systems, e.g., by offering
mnemonics for difficult-to-remember facts, or by changing
the adaptive selection of questions to prefer
easy-to-remember facts at the beginning of a session.
Specifically, results reported in Figure 3 suggest that
different adaptive behaviour may be useful for learning
African countries and provinces of China.

A possible limitation of this study is that the data used do
not come from a properly designed and controlled experiment,
but from an adaptive system which uses a student model to
choose questions [8]. This may potentially cause a bias in the
performed analysis. Although it seems unlikely that the
reported results would be significantly influenced by this
data source, feedback loops between student models and data
collection deserve attention [6].

Another simplification of the current work is that we do not
consider the feedback provided by the system when a student
answers incorrectly. This feedback clearly has an impact on
the memory activation of the selected wrong answer. This
raises a more general question: What is more important for
the practical development of adaptive educational systems -
proper treatment of principal issues (e.g., spacing effect) or
incorporation of practical features into the model (e.g.,
effect of wrong answers)?